Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model

Authors

  • Grigori Sidorov
  • Alexander F. Gelbukh
  • Helena Gómez-Adorno
  • David Pinto
Abstract

We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words “play” and “game” are different but related. When there is no similarity between features then our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call “soft cosine measure”. We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.
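The abstract does not reproduce the formulas, so the following Python sketch is only an illustration of the idea, not the authors' implementation. It assumes the commonly cited form of the soft cosine, in which each pair of features (i, j) contributes `sim[i][j] * a[i] * b[j]` to the dot product, and it assumes a hypothetical normalization (`1 - distance / max_length`) to turn Levenshtein distance into a similarity. The function names are illustrative.

```python
import math


def levenshtein(a, b):
    """Edit distance between two sequences.

    Works on strings (distance in characters) or on lists
    (distance in elements, e.g. the elements of an n-gram).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def feature_similarity(f, g):
    """ASSUMPTION: similarity = 1 - distance / max length, in [0, 1]."""
    longest = max(len(f), len(g))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(f, g) / longest


def soft_dot(a, b, sim):
    """Dot product in which every feature pair is weighted by sim[i][j]."""
    return sum(sim[i][j] * a[i] * b[j]
               for i in range(len(a))
               for j in range(len(b)))


def soft_cosine(a, b, sim):
    """Soft cosine of vectors a and b (assumes non-negative weights and sim)."""
    denom = math.sqrt(soft_dot(a, a, sim)) * math.sqrt(soft_dot(b, b, sim))
    return soft_dot(a, b, sim) / denom if denom else 0.0


# Usage: build the feature similarity matrix once, then compare vectors.
features = ["play", "player", "game"]
sim = [[feature_similarity(f, g) for g in features] for f in features]
doc_a = [1, 0, 0]  # contains only "play"
doc_b = [0, 1, 0]  # contains only "player"
score = soft_cosine(doc_a, doc_b, sim)
```

With an identity matrix for `sim` (no similarity between distinct features), `soft_cosine` reduces to the standard cosine, which matches the property stated in the abstract. In the usage above the two documents share no feature, so the standard cosine would be 0, while the soft cosine is positive because "play" and "player" are close in edit distance.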

Similar resources

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...

Single-valued neutrosophic similarity measures based on cotangent function and their application in the fault diagnosis of steam turbine

Similarity measure is an important tool in pattern recognition and fault diagnosis. This paper proposes two cotangent similarity measures for single-valued neutrosophic sets (SVNSs) based on cotangent function. Then, the weighted cotangent similarity measures are introduced by considering the importance of each element. Moreover, by the comparison between the cotangent similarity measures of SVN...

SOME SIMILARITY MEASURES FOR PICTURE FUZZY SETS AND THEIR APPLICATIONS

In this work, we shall present some novel process to measure the similarity between picture fuzzy sets. Firstly, we adopt the concept of intuitionistic fuzzy sets, interval-valued intuitionistic fuzzy sets and picture fuzzy sets. Secondly, we develop some similarity measures between picture fuzzy sets, such as, cosine similarity measure, weighted cosine similarity measure, set-theoretic similar...

SRIUBC-Core: Multiword Soft Similarity Models for Textual Similarity

In this year’s Semantic Textual Similarity evaluation, we explore the contribution of models that provide soft similarity scores across spans of multiple words, over the previous year’s system. To this end, we explored the use of neural probabilistic language models and a TF-IDF weighted variant of Explicit Semantic Analysis. The neural language model systems used vector representations of indi...

ORDERED INTUITIONISTIC FUZZY SOFT MODEL OF FLOOD ALARM

A flood warning system is a non-structural measure for flood mitigation. Several parameters are responsible for flood related disasters. This work illustrates an ordered intuitionistic fuzzy analysis that has the capability to simulate the unknown relations between a set of meteorological and hydrological parameters. In this paper, we first define ordered intuitionistic fuzzy soft sets and est...

Journal:
  • Computación y Sistemas

Volume 18

Publication date: 2014